[WIP] Add dynamic scheduling operation mode#271

Open
mtreinish wants to merge 19 commits into main from dynamic-schedule

@mtreinish

Owner

This commit adds an opt-in scheduler option for dynamic scheduling.
Instead of partitioning the test list up front based on historical
timing data, this commit lets each worker ask for the next test
dynamically. This is built using Python's multiprocessing module to
launch new workers instead of shelling out to call python via
subprocess.

This will hopefully provide better worker balance, since each worker
stays occupied until there are no more tests to run, instead of us
trying to pack each worker optimally up front. It should also
hopefully improve the pdb story for users who use pdb with tests:
instead of spawning subprocesses that call python to invoke the
subunit runner and reading the subunit stream from stdout, this
uses multiprocessing to fork workers and uses pipes to pass the subunit
streams between workers.
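
To illustrate the balance argument, here is a small simulation sketch, not code from this PR. It assumes a greedy longest-first partitioner as a stand-in for up-front scheduling (stestr's actual partitioner may differ), and the function names and sample timings are hypothetical. It compares the total wall time when the historical estimates are stale against workers simply pulling the next test when they become free.

```python
import heapq


def static_makespan(estimates, actual, workers):
    """Partition up front on *estimated* durations, then run with actual ones.

    Greedy longest-first packing is an assumption made for this sketch.
    """
    loads = [(0.0, w) for w in range(workers)]  # (estimated load, worker index)
    heapq.heapify(loads)
    assignment = {w: [] for w in range(workers)}
    for test in sorted(estimates, key=estimates.get, reverse=True):
        load, w = heapq.heappop(loads)  # least-loaded worker so far
        assignment[w].append(test)
        heapq.heappush(loads, (load + estimates[test], w))
    # Wall time is set by the worker whose real durations add up the most.
    return max(sum(actual[t] for t in tests) for tests in assignment.values())


def dynamic_makespan(order, actual, workers):
    """Each worker takes the next test from the list as soon as it is free."""
    free_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for test in order:
        start = heapq.heappop(free_at)  # earliest idle worker grabs the test
        heapq.heappush(free_at, start + actual[test])
    return max(free_at)


# Hypothetical data: history says every test takes 1s, but "a" now takes 4s.
estimates = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
actual = {"a": 4.0, "b": 1.0, "c": 1.0, "d": 1.0}

print(static_makespan(estimates, actual, workers=2))      # 5.0
print(dynamic_makespan(list(actual), actual, workers=2))  # 4.0
```

With accurate timing data the up-front partition can do just as well; the difference only shows up when the estimates are missing or out of date.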

@mtreinish

Owner Author

This still fails some tests, but after switching away from subunit.run it seems to run reliably on Python >=3.5. It will also need proper test coverage, since this flow is very different from what was there before. I should probably add a dynamic job to Travis and AppVeyor to split running the full suite between the old way and this new approach.

Comment thread stestr/commands/run.py Outdated

@aspiers aspiers left a comment


Thanks for accepting my suggestion. As per this comment this doesn't seem to work for me yet.

@coveralls

coveralls commented Oct 29, 2019

Coverage decreased (-21.2%) to 46.985% when pulling e8989ca on dynamic-schedule into bbc839f on master.

mtreinish and others added 4 commits November 24, 2019 06:17
This commit adds an opt-in scheduler option for dynamic scheduling.
Instead of partitioning the test list up front based on historical
timing data, this commit lets each worker ask for the next test
dynamically. This is built using Python's multiprocessing module to
launch new workers instead of shelling out to call python via
subprocess.

This will hopefully provide better worker balance, since each worker
stays occupied until there are no more tests to run, instead of us
trying to pack each worker optimally up front. It should also
hopefully improve the pdb story for users who use pdb with tests:
instead of spawning subprocesses that call python to invoke the
subunit runner and reading the subunit stream from stdout, this
uses multiprocessing to fork workers and uses pipes to pass the subunit
streams between workers.
Co-Authored-By: Adam Spiers <github@adamspiers.org>
This commit fixes the failing tests by catching a couple of things
missed in the update. The biggest fix was that for the --no-discover
case we still use a subprocess, and because of that we need to tell
output.ReturnCodeToSubunit that the input is not dynamic (and is
therefore a Popen object) so it can handle that properly. The other
major change is that the return code tests are updated so that the
stdout and stderr from the subprocess calls are always decoded in the
non-subunit test cases. This was done primarily for ease of debugging,
but it also enabled the removal of several decode() calls when the
output is parsed.
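
As a rough illustration of why the caller has to say which kind of handle it is passing, here is a minimal sketch. It is not the code in stestr/output.py; the helper name `wait_for_exit_code` and its `dynamic` parameter are hypothetical. The only facts it relies on are that `subprocess.Popen.wait()` returns the exit status, while a `multiprocessing.Process` is waited on with `join()` and exposes its status as `exitcode`.

```python
import subprocess
import sys


def wait_for_exit_code(proc, dynamic):
    """Return a worker's exit code for either kind of process handle.

    Hypothetical helper for illustration only.
    """
    if dynamic:
        # multiprocessing.Process: join() blocks until the worker exits,
        # and the status is then available as the exitcode attribute.
        proc.join()
        return proc.exitcode
    # subprocess.Popen: wait() blocks and returns the exit status directly.
    return proc.wait()


# The subprocess path, as used for --no-discover in the commit above.
popen = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
print(wait_for_exit_code(popen, dynamic=False))
```

Calling `join()` on something that is not a process object fails with an AttributeError, which is why the flag (or an equivalent type check) matters.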
This is a refinement of the previous commit to reduce unnecessary changes
to the functional tests in the test_return_codes module. Mainly, always
decoding the output from the subprocess for testing broke things
unexpectedly when a bytes object was expected.
I originally developed this feature when we still supported older
Python versions in stestr. The dynamic scheduling feature depends on
functionality added in Python 3.5. The WIP feature branch then sat
stale for years, and in that time we've bumped the minimum supported
version of Python to 3.7, so the runtime check for older Python versions
is no longer needed.
@codecov-commenter

codecov-commenter commented Jul 14, 2023

Codecov Report

❌ Patch coverage is 19.80198% with 81 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.74%. Comparing base (20ec64d) to head (71e8eb0).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| stestr/scheduler.py | 2.63% | 37 Missing ⚠️ |
| stestr/test_processor.py | 27.90% | 30 Missing and 1 partial ⚠️ |
| stestr/output.py | 30.76% | 7 Missing and 2 partials ⚠️ |
| stestr/commands/run.py | 42.85% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #271      +/-   ##
==========================================
- Coverage   61.42%   59.74%   -1.68%     
==========================================
  Files          30       30              
  Lines        2613     2703      +90     
  Branches      404      421      +17     
==========================================
+ Hits         1605     1615      +10     
- Misses        889      964      +75     
- Partials      119      124       +5     

| Flag | Coverage Δ | |
|---|---|---|
| unittests | 59.74% <19.80%> (-1.68%) | ⬇️ |

This commit fixes an issue that occurred in earlier commits on the PR
around the initialization of the worker processes and the scope of the
launch method. Previously, if the method used to launch the workers
returned before all the workers had accessed the queue for the first
time, those workers wouldn't be able to read from the queue. This race
condition arose because the Queue was locally scoped to the method and
would be deleted by the main process before the other workers could
read it. It specifically occurred on systems using the "forkserver" or
"spawn" multiprocessing start methods, because the child processes
didn't have the queue object, while "fork" worked because the process
memory was copied into the child process. This commit fixes this by
scoping the Queue object to the instance, which means it survives as
long as the test processor object does (which is typically the entire
run command).

As part of this change, the start method used by the new dynamic
scheduler is fixed to "spawn" to minimize any potential
interactions between stestr and the code under test. This mirrors the
behavior of running in non-dynamic scheduler mode, because spawn is
roughly equivalent to calling python in a subprocess.
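
The shape of that fix can be sketched as follows. This is an illustration under assumptions, not the PR's code: the class name `DynamicProcessor`, the `_worker` function, and the `None` sentinel are all hypothetical. What it shows is the queue living on the instance rather than in a local variable of `launch()`, and an explicit "spawn" context obtained from `multiprocessing.get_context`.

```python
import multiprocessing

_SENTINEL = None  # hypothetical marker telling a worker there is no more work


def _worker(test_queue, result_queue):
    """Pull test ids until the sentinel arrives, reporting each one back."""
    name = multiprocessing.current_process().name
    while True:
        test_id = test_queue.get()
        if test_id is _SENTINEL:
            break
        # A real worker would run the test here; this sketch only records it.
        result_queue.put((name, test_id))


class DynamicProcessor:
    """Hypothetical stand-in for a test processor that owns its queues."""

    def __init__(self, test_ids, concurrency):
        # Fixed start method, so children never inherit a forked copy of the
        # parent's memory.
        self._ctx = multiprocessing.get_context("spawn")
        # Instance attributes: the queues live as long as this object does,
        # not just until launch() returns.
        self._test_queue = self._ctx.Queue()
        self._result_queue = self._ctx.Queue()
        self._test_ids = list(test_ids)
        self._concurrency = concurrency
        self._procs = []

    def launch(self):
        """Fill the queue and start the workers, returning immediately."""
        for test_id in self._test_ids:
            self._test_queue.put(test_id)
        for _ in range(self._concurrency):
            self._test_queue.put(_SENTINEL)  # one stop marker per worker
        for _ in range(self._concurrency):
            proc = self._ctx.Process(
                target=_worker, args=(self._test_queue, self._result_queue)
            )
            proc.start()
            self._procs.append(proc)
        return self._procs

    def collect(self):
        """Drain one result per test, then wait for the workers to exit."""
        results = [self._result_queue.get() for _ in self._test_ids]
        for proc in self._procs:
            proc.join()
        return results
```

Because "spawn" re-imports the main module in each child, a script using this would need to call `launch()` and `collect()` from under an `if __name__ == "__main__":` guard.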
This commit improves the documentation of the new --dynamic flag to
explain how it operates and what its goal is. It also makes it
clear that the feature is experimental and opt-in at your own risk.
Testing also showed that this doesn't currently work on Windows. Instead
of blocking the feature over a platform used by 2-3% of our users
(according to https://pypistats.org/packages/stestr ), this just marks it
as currently unsupported there. We will have to revisit how to make this
work on Windows before we stabilize the feature.
@thomasgoirand

Under Python 3.13, I get:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/stestr/output.py", line 183, in __del__
    self.proc.join()
AttributeError: 'dict' object has no attribute 'join'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=6, pipe_handle=103)
                                                  ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.13/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/lib/python3.13/multiprocessing/synchronize.py", line 115, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory
